ALEVS: Active Learning by Statistical Leverage Sampling
نویسندگان
چکیده
Active learning aims to obtain a classifier of high accuracy by using fewer label requests in comparison to passive learning by selecting effective queries. Many active learning methods have been developed in the past two decades, which sample queries based on informativeness or representativeness of unlabeled data points. In this work, we explore a novel querying criterion based on statistical leverage scores. The statistical leverage scores of a row in a matrix are the squared row-norms of the matrix containing its (top) left singular vectors and is a measure of influence of the row on the matrix. Leverage scores have been used for detecting high influential points in regression diagnostics (Chatterjee & Hadi, 1986) and have been recently shown to be useful for data analysis (Drineas et al., 2008) and randomized low-rank matrix approximation algorithms (Gittens & Mahoney, 2013). We explore how sampling data instances with high statistical leverage scores perform in active learning. Our empirical comparison on several binary classification datasets indicate that querying high leverage points is an effective strategy.
منابع مشابه
An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection
In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction for a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction error due to sampling probabilities, (ii) exhi...
متن کاملFast Randomized Kernel Methods With Statistical Guarantees
One approach to improving the running time of kernel-based machine learning methods is to build a small sketch of the input and use it in lieu of the full kernel matrix in the machine learning task of interest. Here, we describe a version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance. By extending the notion of statistical...
متن کاملEmpirical Comparison of Active Learning Strategies for Handling Temporal Drift
Active learning strategies often assume that the target concept will remain stationary over time. However, in many real world systems, it is not uncommon for the target concept and distribution properties of the generated data to change over time. This paper presents an empirical study that evaluates the effectiveness of using active learning strategies to train statistical models in the presen...
متن کاملRidge Regression and Provable Deterministic Ridge Leverage Score Sampling
Ridge leverage scores provide a balance between low-rank approximation and regularization, and are ubiquitous in randomized linear algebra and machine learning. Deterministic algorithms are also of interest in the moderately big data regime, because deterministic algorithms provide interpretability to the practitioner by having no failure probability and always returning the same results. We pr...
متن کاملFast Randomized Kernel Ridge Regression with Statistical Guarantees
One approach to improving the running time of kernel-based methods is to build a small sketch of the kernel matrix and use it in lieu of the full matrix in the machine learning task of interest. Here, we describe a version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance. By extending the notion of statistical leverage scores...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1507.04155 شماره
صفحات -
تاریخ انتشار 2015